171 research outputs found
Identification of genetic factors involved in autism spectrum disorders and dyslexia
Autism spectrum disorders (ASD) affect approximately 1% of the general population. These disorders are characterized by deficits in social communication, stereotyped behaviors, and restricted interests. Several genes involved in ASD have been identified, such as NLGN3-4X, NRXN1-3 and SHANK1-3. In previous years, ASD was considered a complex set of monogenic disorders. Recent whole-genome studies nevertheless suggest the presence of modifier genes (the "multiple hits model"). Dyslexia is characterized by difficulties in learning to read and write, and it affects 5-15% of the general population. The genetic factors involved remain unknown, as only candidate genes or loci have been identified.
My thesis had two main objectives: pursuing the identification of genetic factors involved in ASD, and discovering a first genetic factor for dyslexia. I therefore studied two populations: patients with ASD (N > 600) from France, Sweden and the Faroe Islands, and patients with dyslexia (N > 200) from France, including one family with 11 affected members across three generations. I used both Illumina microarrays (600K and 5M) and whole-genome sequencing to conduct linkage and association analyses. Regarding ASD, analyses of copy number variants (CNVs) allowed me to confirm the association of several synaptic genes with autism and to identify new candidate genes. In particular, the study of 30 patients from the Faroe Islands confirmed the involvement of NLGN1 and NRXN1 in autism and identified a new candidate gene, IQSEC3. In parallel, I explored PRRT2, located at 16p11.2. PRRT2 encodes a member of the synaptic SNARE complex, which allows the release of synaptic vesicles. I was not able to demonstrate any association with ASD, but I showed that this gene, which is important for some neurological diseases, is under different selection pressures in different populations. Regarding dyslexia, I performed a linkage analysis (lod-score method) on the large family with 11 affected individuals across three generations. This study identified CNTNAP2 as a vulnerability gene for dyslexia. This finding is important because the same gene is also associated with ASD. However, none of the 20 rare variants discovered by whole-genome sequencing is located in the coding regions of the gene. Several variants located in regulatory regions are candidates.
To conclude, my findings enabled the identification of new candidate genes for ASD, confirmed the role of synaptic genes in this disorder, and showed for the first time, through linkage analysis, the role of CNTNAP2 in dyslexia.
PARIS5-Bibliotheque electronique (751069902) / Sudoc, France
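The lod-score method mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest textbook setting (phase-known, fully informative meioses), which is not necessarily the model used in the thesis; the function name `lod_score` and the counts below are hypothetical and are not taken from the study.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD score log10(L(theta) / L(0.5)) for phase-known, informative meioses.

    theta is the recombination fraction being tested (0 <= theta <= 0.5);
    theta = 0.5 corresponds to no linkage.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # With theta = 0, a single recombinant makes the likelihood zero.
        if recombinants > 0:
            return float("-inf")
        log_likelihood = 0.0
    else:
        log_likelihood = (recombinants * math.log10(theta)
                          + non_recombinants * math.log10(1.0 - theta))
    # Under the null hypothesis every meiosis has probability 0.5.
    return log_likelihood - meioses * math.log10(0.5)

# Hypothetical example: 10 informative meioses, no recombinants observed.
print(round(lod_score(0, 10, 0.0), 2))  # 10 * log10(2)
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage, which is why ten informative meioses without recombination sit right at that threshold.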
Geodesic Sinkhorn: optimal transport for high-dimensional datasets
Understanding the dynamics and reactions of cells from population snapshots
is a major challenge in single-cell transcriptomics. Here, we present Geodesic
Sinkhorn, a method for interpolating populations along a data manifold that
leverages existing kernels developed for single-cell dimensionality reduction
and visualization methods. Our Geodesic Sinkhorn method uses a heat-geodesic
ground distance that, as compared to Euclidean ground distances, is more
accurate for interpolating single-cell dynamics on a wide variety of datasets
and significantly speeds up the computation for sparse kernels. We first apply
Geodesic Sinkhorn to 10 single-cell transcriptomics time series interpolation
datasets as a drop-in replacement for existing interpolation methods where it
outperforms on all datasets, showing its effectiveness in modeling cell
dynamics. Second, we show how to efficiently approximate the operator with
polynomial kernels allowing us to improve scaling to large datasets. Finally,
we define the conditional Wasserstein-average treatment effect and show how it
can elucidate the treatment effect on single-cell populations on a drug screen. Comment: 15 pages, 5 tables, 5 figures, submitted to RECOMB 202
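To make the idea concrete, here is a minimal sketch of entropy-regularized optimal transport on a graph in which a heat kernel plays the role of the Gibbs kernel in the Sinkhorn iterations. It is an illustration under stated assumptions, not the authors' implementation: the dense eigendecomposition, the toy path graph, the diffusion time `t`, and the function names `heat_kernel` and `sinkhorn_plan` are all hypothetical choices, and the abstract's polynomial approximation for sparse kernels is not reproduced.

```python
import numpy as np

def heat_kernel(adjacency, t):
    """Dense heat kernel exp(-t L) for the graph Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Scale each eigenvector column by exp(-t * eigenvalue), then recombine.
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

def sinkhorn_plan(kernel, mu, nu, n_iter=1000):
    """Sinkhorn scaling: find u, v so diag(u) K diag(v) has marginals mu, nu."""
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (kernel @ v)
        v = nu / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

# Toy data manifold: a path graph 0-1-2-3-4.
n = 5
adjacency = np.zeros((n, n))
for i in range(n - 1):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0

kernel = heat_kernel(adjacency, t=1.0)
mu = np.array([0.4, 0.3, 0.1, 0.1, 0.1])  # population at the first snapshot
nu = mu[::-1].copy()                      # population at the second snapshot
plan = sinkhorn_plan(kernel, mu, nu)
print(plan.sum(axis=1), plan.sum(axis=0))
```

The only operation inside the loop is a kernel-vector product, so a sparse or cheaply approximated kernel directly reduces the cost of each iteration.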
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Diffusion-based manifold learning methods have proven useful in
representation learning and dimensionality reduction of modern high
dimensional, high throughput, noisy datasets. Such datasets are especially
present in fields like biology and physics. While it is thought that these
methods preserve underlying manifold structure of data by learning a proxy for
geodesic distances, no specific theoretical links have been established. Here,
we establish such a link via results in Riemannian geometry explicitly
connecting heat diffusion to manifold distances. In this process, we also
formulate a more general heat kernel based manifold embedding method that we
call heat geodesic embeddings. This novel perspective makes clearer the choices
available in manifold learning and denoising. Results show that our method
outperforms existing state of the art in preserving ground truth manifold
distances, and preserving cluster structure in toy datasets. We also showcase
our method on single cell RNA-sequencing datasets with both continuum and
cluster structure, where our method enables interpolation of withheld
timepoints of data. Finally, we show that parameters of our more general method
can be configured to give results similar to PHATE (a state-of-the-art
diffusion based manifold learning method) as well as SNE (an
attraction/repulsion neighborhood based method that forms the basis of t-SNE). Comment: 31 pages, 13 figures, 10 tables
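The abstract does not state which result links heat diffusion to manifold distances. A classical candidate is Varadhan's formula, shown here as an assumption about the kind of link meant, together with a check in Euclidean space where the heat kernel is known in closed form. The symbols $h_t$ (heat kernel), $d$ (geodesic distance) and $n$ (dimension) are notation introduced for this sketch.

```latex
% Varadhan's formula: small-time heat kernel asymptotics recover geodesic distance.
\[
  d(x, y)^2 \;=\; \lim_{t \to 0^+} \, -4t \,\log h_t(x, y)
\]

% Check in \mathbb{R}^n, where the heat kernel is Gaussian:
\begin{align*}
  h_t(x, y) &= (4\pi t)^{-n/2} \exp\!\left(-\frac{\lVert x - y\rVert^2}{4t}\right), \\
  -4t \log h_t(x, y) &= \lVert x - y\rVert^2 + 2 n t \log(4\pi t)
  \;\xrightarrow[t \to 0^+]{}\; \lVert x - y\rVert^2 .
\end{align*}
```

The correction term $2nt\log(4\pi t)$ vanishes as $t \to 0^+$, so the Euclidean distance is recovered exactly in the limit.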
Manifold Interpolating Optimal-Transport Flows for Trajectory Inference
We present a method called Manifold Interpolating Optimal-Transport Flow
(MIOFlow) that learns stochastic, continuous population dynamics from static
snapshot samples taken at sporadic timepoints. MIOFlow combines dynamic models,
manifold learning, and optimal transport by training neural ordinary
differential equations (Neural ODE) to interpolate between static population
snapshots as penalized by optimal transport with manifold ground distance.
Further, we ensure that the flow follows the geometry by operating in the
latent space of an autoencoder that we call a geodesic autoencoder (GAE). In
GAE the latent space distance between points is regularized to match a novel
multiscale geodesic distance on the data manifold that we define. We show that
this method is superior to normalizing flows, Schrödinger bridges and other
generative models that are designed to flow from noise to data in terms of
interpolating between populations. Theoretically, we link these trajectories
with dynamic optimal transport. We evaluate our method on simulated data with
bifurcations and merges, as well as scRNA-seq data from embryoid body
differentiation, and acute myeloid leukemia treatment. Comment: Presented at NeurIPS 2022, 24 pages, 7 tables, 14 figures
Simulation-free Schrödinger bridges via score and flow matching
We present simulation-free score and flow matching ([SF]²M), a
simulation-free objective for inferring stochastic dynamics given unpaired
source and target samples drawn from arbitrary distributions. Our method
generalizes both the score-matching loss used in the training of diffusion
models and the recently proposed flow matching loss used in the training of
continuous normalizing flows. [SF]²M interprets continuous-time stochastic
generative modeling as a Schrödinger bridge (SB) problem. It relies on static
entropy-regularized optimal transport, or a minibatch approximation, to
efficiently learn the SB without simulating the learned stochastic process. We
find that [SF]²M is more efficient and gives more accurate solutions to the
SB problem than simulation-based methods from prior work. Finally, we apply
[SF]²M to the problem of learning cell dynamics from snapshot data. Notably,
[SF]²M is the first method to accurately model cell dynamics in high
dimensions and can recover known gene regulatory networks from simulated data. Comment: A version of this paper appeared in the New Frontiers in Learning,
Control, and Dynamical Systems workshop at ICML 2023. Code:
https://github.com/atong01/conditional-flow-matchin
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling
technique, but they have been held back by limitations in their
simulation-based maximum likelihood training. We introduce the generalized
conditional flow matching (CFM) technique, a family of simulation-free training
objectives for CNFs. CFM features a stable regression objective like that used
to train the stochastic flow in diffusion models but enjoys the efficient
inference of deterministic flow models. In contrast to both diffusion models
and prior CNF training algorithms, CFM does not require the source distribution
to be Gaussian or require evaluation of its density. A variant of our objective
is optimal transport CFM (OT-CFM), which creates simpler flows that are more
stable to train and lead to faster inference, as evaluated in our experiments.
Furthermore, OT-CFM is the first method to compute dynamic OT in a
simulation-free way. Training CNFs with CFM improves results on a variety of
conditional and unconditional generation tasks, such as inferring single cell
dynamics, unsupervised image translation, and Schrödinger bridge inference. Comment: A version of this paper appeared in the New Frontiers in Learning,
Control, and Dynamical Systems workshop at ICML 2023. Title change from v1.
Code: https://github.com/atong01/conditional-flow-matchin
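The simulation-free regression objective described in this abstract can be sketched as follows. This is a minimal illustration assuming a straight-line Gaussian probability path between independently paired source and target samples. It does not use the authors' code, and it omits the minibatch optimal transport pairing that defines OT-CFM. The names `cfm_batch`, `cfm_loss` and `constant_field` are hypothetical.

```python
import numpy as np

def cfm_batch(x0, x1, sigma=0.0, rng=None):
    """Build one regression batch for conditional flow matching.

    For each pair (x0, x1), sample t ~ U(0, 1), a point x_t on the path
    N((1 - t) x0 + t x1, sigma^2 I), and the regression target u_t = x1 - x0.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = rng.uniform(size=x0.shape[0])
    mean = (1.0 - t[:, None]) * x0 + t[:, None] * x1
    x_t = mean + sigma * rng.standard_normal(x0.shape)
    u_t = x1 - x0
    return t, x_t, u_t

def cfm_loss(vector_field, t, x_t, u_t):
    """Mean squared error between a candidate field v(t, x) and the targets."""
    pred = vector_field(t, x_t)
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=1)))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((256, 2))        # source samples (need not be Gaussian)
x1 = rng.standard_normal((256, 2)) + 3.0  # target samples, shifted by 3
t, x_t, u_t = cfm_batch(x0, x1, sigma=0.0, rng=rng)

# A stand-in for a trained network: a constant field pointing along the shift.
constant_field = lambda t, x: np.full_like(x, 3.0)
print(cfm_loss(constant_field, t, x_t, u_t))
```

No ODE is simulated at any point: each batch only needs samples from the two distributions, which is what makes the objective simulation-free.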
Investigating the contributions of circadian pathway and insomnia risk genes to autism and sleep disturbances
Sleep disturbance is prevalent in youth with Autism Spectrum Disorder (ASD). Researchers have posited that circadian dysfunction may contribute to sleep problems or exacerbate ASD symptomatology. However, there is limited genetic evidence of this. It is also unclear how insomnia risk genes identified through GWAS in general populations are related to ASD and common sleep problems like insomnia traits in ASD. We investigated the contribution of copy number variants (CNVs) encompassing circadian pathway genes and insomnia risk genes to ASD risk as well as sleep disturbances in children with ASD. We studied 5860 ASD probands and 2092 unaffected siblings from the Simons Simplex Collection (SSC) and MSSNG database, as well as 7509 individuals from two unselected populations (IMAGEN and Generation Scotland). Sleep duration and insomnia symptoms were parent reported for SSC probands. We identified 335 and 616 rare CNVs encompassing circadian and insomnia risk genes respectively. Deletions and duplications with circadian genes were overrepresented in ASD probands compared to siblings and unselected controls. For insomnia-risk genes, deletions (not duplications) were associated with ASD in both cohorts. Results remained significant after adjusting for cognitive ability. CNVs containing circadian pathway and insomnia risk genes showed a stronger association with ASD, compared to CNVs containing other genes. Circadian genes did not influence sleep duration or insomnia traits in ASD. Insomnia risk genes intolerant to haploinsufficiency increased risk for insomnia when duplicated. CNVs encompassing circadian and insomnia risk genes increase ASD liability with little to no observable impacts on sleep disturbances
Genome wide analysis of gene dosage in 24,092 individuals estimates that 10,000 genes modulate cognitive ability
Genomic copy number variants (CNVs) are routinely identified and reported back to patients with neuropsychiatric disorders, but their quantitative effects on essential traits such as cognitive ability are poorly documented. We have recently shown that the effect size of deletions on cognitive ability can be statistically predicted using measures of intolerance to haploinsufficiency. However, the effect sizes of duplications remain unknown. It is also unknown whether the effects of multigenic CNVs are driven by a few genes intolerant to haploinsufficiency or distributed across tolerant genes as well. Here, we identified all CNVs > 50 kilobases in 24,092 individuals from unselected and autism cohorts with assessments of general intelligence. Statistical models used measures of intolerance to haploinsufficiency of genes included in CNVs to predict their effect size on intelligence. Intolerant genes decrease general intelligence by 0.8 and 2.6 points of intelligence quotient when duplicated or deleted, respectively. Effect sizes showed no heterogeneity across cohorts. Validation analyses demonstrated that models could predict CNV effect sizes with 78% accuracy. Data on the inheritance of 27,766 CNVs showed that deletions and duplications with the same effect size on intelligence occur de novo at the same frequency. We estimated that around 10,000 intolerant and tolerant genes negatively affect intelligence when deleted, and less than 2% have large effect sizes. Genes encompassed in CNVs were not enriched in any GO terms, but gene regulation and brain expression were GO terms overrepresented in the intolerant subgroup. Such pervasive effects on cognition may be related to emergent properties of the genome not restricted to a limited number of biological pathways.
Genome-wide association scan identifies new variants associated with a cognitive predictor of dyslexia
Developmental dyslexia (DD) is one of the most prevalent learning disorders, with high impact on school and psychosocial development and high comorbidity with conditions like attention-deficit hyperactivity disorder (ADHD), depression, and anxiety. DD is characterized by deficits in different cognitive skills, including word reading, spelling, rapid naming, and phonology. To investigate the genetic basis of DD, we conducted a genome-wide association study (GWAS) of these skills within one of the largest studies available, including nine cohorts of reading-impaired and typically developing children of European ancestry (N = 2562-3468). We observed a genome-wide significant effect (p < 1 × 10⁻⁸) on rapid automatized naming of letters (RANlet) for variants on 18q12.2, within MIR924HG (micro-RNA 924 host gene; rs17663182, p = 4.73 × 10⁻⁹), and a suggestive association on 8q12.3 within NKAIN3 (encoding a cation transporter; rs16928927, p = 2.25 × 10⁻⁸). rs17663182 (18q12.2) also showed genome-wide significant multivariate associations with RAN measures (p = 1.15 × 10⁻⁸) and with all the cognitive traits tested (p = 3.07 × 10⁻⁸), suggesting (relational) pleiotropic effects of this variant. A polygenic risk score (PRS) analysis revealed significant genetic overlaps of some of the DD-related traits with educational attainment (EDUyears) and ADHD. Reading and spelling abilities were positively associated with EDUyears (p ~ [10⁻⁵-10⁻⁷]) and negatively associated with ADHD PRS (p ~ [10⁻⁸-10⁻¹⁷]). This corroborates a long-standing hypothesis on the partly shared genetic etiology of DD and ADHD, at the genome-wide level. Our findings suggest new candidate DD susceptibility genes and provide new insights into the genetics of dyslexia and its comorbidities. Peer reviewed